Monterey
- North America > Canada > Quebec > Montreal (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (9 more...)
RGMDT: Return-Gap-MinimizingDecisionTree ExtractioninNon-EuclideanMetricSpace
In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem into a novel non-euclidean clustering problem over the local observation and action values space of each agent, with action values as cluster labels and the upper bound on the return gap as clustering loss.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > Monterey County > Monterey (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Finland > Northern Savo > Kuopio (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Macao (0.04)
- (25 more...)
- Information Technology > Security & Privacy (0.36)
- Government > Military (0.36)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
- Information Technology > Data Science (0.68)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California > Monterey County > Monterey (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Songlin Y ang Bailin Wang Y u Zhang Yikang Shen Y oon Kim Massachusetts Institute of Technology Soochow University
Transformers with linear attention (i.e., linear transfor mers) and state-space models have recently been suggested as a viable linear-time alt ernative to transformers with softmax attention. However, these models still underp erform transformers especially on tasks that require in-context retrieval. Whil e more expressive variants of linear transformers which replace the additive upda te in linear transformers with the delta rule [DeltaNet; 101 ] have been found to be more effective at associative recall, existing algorithms for training such mode ls do not parallelize over sequence length and are thus inefficient to train on modern ha rdware. This work describes a hardware-efficient algorithm for training line ar transformers with the delta rule, which exploits a memory-efficient representati on for computing products of Householder matrices [ 11 ]. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B mode l for 100B tokens and find that it outperforms recent linear-time baselines su ch as Mamba [ 31 ] and GLA [ 124 ] in terms of perplexity and zero-shot performance on downst ream tasks. We also experiment with two hybrid models which combine Delt aNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transf ormer baselines.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (19 more...)
- Education (0.67)
- Health & Medicine (0.48)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > Canada > British Columbia > Vancouver (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
- (16 more...)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (14 more...)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (13 more...)
- Health & Medicine (1.00)
- Information Technology (0.92)
- Banking & Finance > Economy (0.45)
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Monterey County > Monterey (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)